AN ALGORITHM WHICH RECOGNIZES NATURAL LANGUAGE
DIALOGUE EXPRESSIONS


COLBY AND PARKISON

OUTLINE
INTRODUCTORY - Discussion of language as code; other approaches:
     a sentence dictionary versus a word dictionary using projection
     rules to yield an interpretation from word definitions;
     experience with old Parry.
PROBLEMS - dialogue problems and methods. Constraints. Special cases.
Preprocessing - dictionary words only
     translations

     (ROGER- WHAT HAPPENED TO FILLERS?)

     contractions
     expansions
     synonyms
     negation
Segmenting - prepositions, wh-words, meta-verbs
     give list
Matching - simple and compound patterns
     association with semantic functions
     first coarsening - drop fillers - give list
     second coarsening - drop one word at a time
     dangers of insertion and restoration
     Recycle condition - sometimes a pattern containing pronouns is
     matched, like "DO YOU AVOID THEM". If THEM could be a number of
     different things and Parry's answer depends on which one it is,
     then the current value of the anaphora, THEM, is substituted for
     THEM and the resulting pattern is looked up. Hopefully, this will
     produce a match to a more specific pattern, like "DO YOU AVOID
     MAFIA".
     default condition - pass surface to memory
     change topic or level
Advantages - real-time performance, pragmatic adequacy and
     effectiveness, performance measures.
     "learning" by adding patterns
     PARRY1 ignored word order - penalty too great
     PARRY1 too sequential, taking the first pattern it found rather
     than looking at the whole input and then deciding.
     PARRY1 had patterns strung out throughout procedures, making it
     cumbersome for the programmer to see what the patterns were.
Limitations - typical failures, possible remedies
     NO NOUNS, ETC. - NO MORE GENERAL RULES LIKE NOUN PHRASE;
     AMBIGUITIES SLIDE THROUGH WHEREAS A PARSER WOULD CATCH THEM
Summary

INTRODUCTION

     To recognize is to identify something as an instance of the
"same again". This familiarity is possible because of recurrent
characteristics of the world which repeat themselves over and over
again. We shall describe an algorithm which recognizes recurrent
characteristics of natural language dialogue expressions through a
multi-stage sequence of functions which progressively transforms
these input expressions into a pattern which eventually best matches
a more abstract stored pattern. The name of the stored pattern has a
pointer to the name of a response function which decides what to do
once the input has been characterized. Here we shall discuss only the
recognizing functions, except for one response function (anaphoric
substitution) which interactively aids the characterization process.
How the response functions operate will be described in a future
communication (Faught, Colby, Parkison).
     In constructing and testing a simulation of paranoid processes,
we were faced with the problem of reproducing paranoid linguistic
behavior in a diagnostic psychiatric interview. The diagnosis of
paranoid states, reactions or modes is made by clinicians who judge a
degree of correspondence between what they observe linguistically in
an interview and their conceptual model of paranoid behavior. There
exists a high degree of agreement about this conceptual model, which
relies mainly on what an interviewee says and how he says it.
     Natural language is a life-expressing code people use for
communication with themselves and others. In a real-life dialogue
such as a psychiatric interview, the participants have interests,
intentions, and expectations which are revealed in their linguistic
expressions. To produce effects on an interviewer which he would
judge similar to the effects produced by a paranoid patient, an
interactive simulation of a paranoid patient must be able to
demonstrate typical paranoid interview behavior. To achieve the
desired effects, the paranoid model must have the ability to deal
with the linguistic behavior of the interviewer.
     There are a number of approaches one might consider for an
ideal handling of dialogue expressions. One approach would be to
construct a dictionary of all expressions which could possibly arise
in an interview. Associated with each expression would be its
interpretation, depending on dialogue context, which in turn must
somehow be defined. For obvious reasons, no one takes this approach
seriously. Instead of an expression dictionary, one might construct a
word dictionary and then use projection rules to yield an
interpretation of a sentence from the dictionary definitions. This
has been the approach adopted by conventional linguistic parsers.
Such a method performs adequately as long as the dictionary involves
only a few hundred words, each word having only one or two senses,
and the dialogue is limited to a mini-world of a few objects and
relations. But the problems which arise in a psychiatric interview
conducted in unrestricted English are too great for this method to be
useful for real-time dialogues, in which immediacy of response is an
important requirement of efficiency.
     There is little consensus knowledge about how humans process
natural language. They can be shown to possess some knowledge of
grammar rules, but this does not entail that they use a grammar in
interpreting and producing language. Irregular verb tenses and noun
plurals do not follow rules, yet people use thousands of them. One
school of linguistics believes that people possess full
transformational grammars for processing language. In our view this
position seems dubious. Transformational grammars were not originally
designed to "understand" a large subset of English; they represented
a set of axioms for deciding whether a string is "grammatical".
Efforts to use them for other purposes have not been fruitful.
     An analysis of what one's problem actually is should guide the
selection or invention of methods appropriate to that problem's
solution. Our problem was not to develop a consistent and general
theory of language nor to assert empirically testable hypotheses
about how people process language. Our problem was to write an
algorithm which recognizes what is being said in a dialogue and what
is being said about it in order to make a response such that a sample
of I-O pairs from the paranoid model is judged similar to a sample of
I-O pairs from paranoid patients. From the perspective of artificial
intelligence as an attempt to get computers to perform human
intellectual tasks, methods had to be devised for the task of
participating in a human dialogue in a paranoid-patient-like way. We
are not making an existence claim that our strategy represents the
way people process language. We sought efficacious methods which
could operate efficiently in real time. We would not deny that our
methods for this task are possibly analogous to the methods humans
use. And since our methods provide a general way of mapping from many
"surface" expressions to a single stored pattern, these methods are
useful for any type of dialogue algorithm.
     To perform the task of managing the communicative uses and
effects of natural language, we adopted a strategy of transforming
the input until a pattern is achieved which completely or partially
matches a more abstract stored pattern. This strategy has proved
adequate for our purposes a satisfactory percentage of the time. (No
one expects an algorithm to be successful 100% of the time, since not
even humans, the best natural language systems around, achieve this
level of performance.) The power of this method for natural language
dialogues lies in its ability to ignore unrecognizable expressions
and irrelevant details. A conventional parser doing word-by-word
analysis fails when it cannot find one or more of the input words in
its dictionary. It is too fragile for a dialogue in that it must know
every word; it cannot guess.
     In early versions of the paranoid model (PARRY1), many (but not
all) of the pattern recognition mechanisms were weak because they
allowed the elements of the pattern to be order-independent. For
example, consider the following expressions:
     (1) WHERE DO YOU WORK?
     (2) WHAT SORT OF WORK DO YOU DO?
     (3) WHAT IS YOUR OCCUPATION?
     (4) WHAT DO YOU DO FOR A LIVING?
     (5) WHERE ARE YOU EMPLOYED?
In PARRY1 a procedure would scan these expressions looking for an
information-bearing contentive such as "work", "for a living", etc.
If it found such a contentive along with a "you" or "your" in the
expression, regardless of word order, it would respond to the
expression as if it were a question about the nature of one's work.
(There is some doubt that this even qualifies as a pattern, since
interrelations between words are ignored and only their presence is
considered.) An insensitivity to word order has the advantage that
lexical items representing different parts of speech can represent
the same concept, e.g. "work" as noun or as verb. But a price is paid
for this resilience and elasticity. We found from experience that,
since English relies heavily on word order to convey the meaning of
its messages, the penalty of misunderstanding (to be distinguished
from ununderstanding) was too great. Hence in PARRY2, as will be
described shortly, all the patterns require a specified word order.
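     The difference can be shown with a minimal sketch (here and in
later sections we use Python purely for illustration; PARRY was not
written in Python, and the function names are ours):

        def parry1_match(pattern, words):
            # PARRY1-style test: every pattern element must occur somewhere
            # in the input; word order is ignored.
            return all(p in words for p in pattern)

        def parry2_match(pattern, words):
            # Order-sensitive test: the pattern elements must occur in the
            # input in the order given (as a subsequence).
            remaining = iter(words)
            return all(p in remaining for p in pattern)

        words = "YOU DO WHAT FOR WORK".split()
        parry1_match(["WHAT", "DO", "YOU", "WORK"], words)   # True: order ignored
        parry2_match(["WHAT", "DO", "YOU", "WORK"], words)   # False: order enforced

Note that PARRY2's actual matching, described below, requires the full
ordered pattern with at most one element missing; the looser
subsequence test above is only meant to show why order matters.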
     It seems agreed that for high-complexity problems it is useful
to have constraints. Diagnostic psychiatric interviews (and
especially those conducted over teletypes) have several natural
constraints. First, clinicians are trained to ask certain questions
in certain ways. These stereotypes can be treated as special cases.
Second, only a few hundred standard topics are brought up by
interviewers, who are trained to use everyday expressions and
especially those used by the patient himself. When the interview is
conducted over teletypes, expressions tend to be shortened, since the
interviewer tries to increase the information transmission rate over
the slow channel of a teletype. (It is said that short expressions
are more grammatical, but think about the phrase "Now now, there
there.") Finally, teletyped interviews represent written speech.
Speech is known to be 50% redundant; hence many unrecognized words
can be ignored without losing the meaning of the message. Written
speech is loaded with idioms, cliches, pat phrases, etc. - all easy
prey for a pattern-recognition approach. It is futile to try to
decode an idiom by analyzing the meanings of its individual words.
One knows what an idiom means or one does not.
     We shall now describe the pattern recognition functions of the
algorithm.

PREPROCESSING

     Each word in the input expression is first looked up in a
dictionary of (currently) 1300 words which remains in core in machine
language. The dictionary consists of a list of words and the names of
the word-classes they can be translated into. The size of the
dictionary is determined by the patterns, i.e. each dictionary word
appears in one or more patterns. (ROGER- SHOW PIECE OF DICT?) Words
in the dictionary reflect PARRY2's main interests. If a word in the
input is not in the dictionary, it is dropped from the pattern being
formed. Thus if the input were:
     WHAT IS YOUR CURRENT OCCUPATION?
and the word "current" is not in the dictionary, the pattern at this
stage becomes:
     (WHAT IS YOUR OCCUPATION)
The question mark is thrown away as redundant, since questions are
recognized by word order. (A statement followed by a question mark
(YOU GAMBLE?) is considered to be communicatively equivalent in its
effects to that statement followed by a period.) Synonymic
translations of words are made, so that the pattern becomes, for
example:
     (WHAT BE YOU JOB)
Groups of words are translated into a single word-class name, so
that, for example, "for a living" becomes "job".
     Misspellings are also handled in the dictionary by simply
rewriting a recognized misspelling into its correct form. Thus "yyou"
becomes "you". The common misspellings were gathered from over 4000
interviews with versions of the paranoid model. Other misspellings do
not appear in the pattern because they are not represented in the
dictionary.
     Certain juxtaposed words are contracted into a single word,
e.g. "GET ALONG WITH" becomes "GETALONGWITH". This is done (1) to
deal with groups of words which are represented as a single element
in the stored pattern, and (2) to prevent segmentation from occurring
at the wrong places, such as at a preposition inside an idiom.
Besides these contractions, certain expansions are made so that, for
example, "DON'T" becomes "DO NOT" and "I'D" becomes "I WOULD".
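     The preprocessing steps just described can be summarized in a
minimal sketch, assuming toy tables that stand in for the real
1300-word dictionary and its translation lists:

        MISSPELLINGS = {"YYOU": "YOU"}
        EXPANSIONS = {"DON'T": ["DO", "NOT"], "I'D": ["I", "WOULD"]}
        CONTRACTIONS = {("GET", "ALONG", "WITH"): "GETALONGWITH",
                        ("FOR", "A", "LIVING"): "JOB"}
        # each dictionary word names the word-class it translates into
        DICTIONARY = {"WHAT": "WHAT", "IS": "BE", "ARE": "BE", "YOU": "YOU",
                      "YOUR": "YOU", "OCCUPATION": "JOB", "WORK": "JOB",
                      "JOB": "JOB", "GETALONGWITH": "GETALONGWITH"}

        def contract(words):
            # rewrite known word groups as single elements, longest group first
            out, i = [], 0
            while i < len(words):
                for n in (3, 2):
                    if tuple(words[i:i+n]) in CONTRACTIONS:
                        out.append(CONTRACTIONS[tuple(words[i:i+n])])
                        i += n
                        break
                else:
                    out.append(words[i])
                    i += 1
            return out

        def preprocess(text):
            words = text.upper().rstrip(" ?.!").split()
            words = [MISSPELLINGS.get(w, w) for w in words]   # "YYOU" -> "YOU"
            expanded = []
            for w in words:
                expanded.extend(EXPANSIONS.get(w, [w]))       # "DON'T" -> DO NOT
            words = contract(expanded)                        # "FOR A LIVING" -> "JOB"
            # translate to word-class names; unknown words are dropped
            return [DICTIONARY[w] for w in words if w in DICTIONARY]

        preprocess("What is your current occupation?")
        # -> ['WHAT', 'BE', 'YOU', 'JOB']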

SEGMENTING

     Another weakness in the crude pattern matching of PARRY1 was
that it took the entire input expression as its basic processing
unit. Hence if only two words were recognized in an eight-word input,
the risk of misunderstanding was great. We needed a way of dealing
with units shorter than the entire input expression.
     Expert telegraphists stay six to twelve words behind a received
message before transcribing it (Bryan and Harter, 1897). Translators
wait until they have heard 4-6 words before they begin translating.
Aided by a heuristic from machine-translation work by Wilks ( ), we
devised a way of bracketing the pattern constructed up to this point
into shorter segments, using the list of words in Fig. 1. The new
pattern formed is termed either "simple", having no delimiters within
it, or "compound", i.e. being made up of two or more simple patterns.
A simple pattern might be:
     ( WHAT BE YOU JOB )
whereas a compound pattern would be:
     (( WHY BE YOU ) ( IN HOSPITAL ))
Our experience with this method of segmentation shows that compound
patterns from psychiatric dialogues rarely consist of more than three
or four fragments.
     After certain verbs ("THINK", "FEEL", etc.) a bracketing occurs
to replace the commonly omitted "THAT", such that:
     ( I THINK YOU BE AFRAID )
becomes:
     (( I THINK ) ( YOU BE AFRAID ))
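     The segmenter can be sketched as follows; the delimiter list
here is a small assumed sample, not the actual list of Fig. 1:

        DELIMITERS = {"IN", "ON", "TO", "IF", "BECAUSE",        # prepositions, etc.
                      "WHY", "WHAT", "WHERE", "HOW"}            # wh-words
        META_VERBS = {"THINK", "FEEL", "BELIEVE"}   # verbs with an omitted "THAT"

        def segment(pattern):
            segments, current = [], []
            for i, word in enumerate(pattern):
                if word in DELIMITERS and current:
                    segments.append(current)    # open a new segment at a delimiter
                    current = []
                current.append(word)
                if word in META_VERBS and i + 1 < len(pattern):
                    segments.append(current)    # bracket where "THAT" was omitted
                    current = []
            if current:
                segments.append(current)
            return segments

        segment(["WHY", "BE", "YOU", "IN", "HOSPITAL"])
        # -> [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]
        segment(["I", "THINK", "YOU", "BE", "AFRAID"])
        # -> [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]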

PREPARATION FOR MATCHING

     Conjunctions serve only as markers for the segmenter, and they
are dropped out after segmentation.
     Negations are handled by extracting the "NOT" from the pattern
and assigning a value to a global variable which indicates to the
algorithm that the expression is negative in form. When a pattern is
finally matched, this variable is consulted. Some patterns have a
pointer to a pattern of opposite meaning if a "NOT" could reverse
their meanings. If this pointer is present and a "NOT" is found, then
the pattern matched is replaced by its opposite, e.g. ( I NOT TRUST
YOU ) is replaced by the pattern ( I MISTRUST YOU ). We have not yet
observed the troublesome case of "he gave me not one but two
messages". (There is no need to scratch where it doesn't itch.)
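     A minimal sketch of this treatment, with OPPOSITES as an assumed
table standing in for the pointer to a pattern of opposite meaning:

        OPPOSITES = {("I", "TRUST", "YOU"): ("I", "MISTRUST", "YOU")}

        def extract_not(pattern):
            # pull "NOT" out of the pattern and record that the expression
            # was negative in form (the global variable of the text)
            negative = "NOT" in pattern
            return [w for w in pattern if w != "NOT"], negative

        def resolve_negation(pattern, negative):
            # if a NOT was seen and the matched pattern has an opposite,
            # replace the pattern by its opposite
            if negative and tuple(pattern) in OPPOSITES:
                return list(OPPOSITES[tuple(pattern)])
            return pattern

        pattern, negative = extract_not(["I", "NOT", "TRUST", "YOU"])
        resolve_negation(pattern, negative)     # -> ['I', 'MISTRUST', 'YOU']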

MATCHING AND RECYCLING

     The algorithm now attempts to match the segmented patterns with
the stored patterns, which currently number about 2000. First a
complete and perfect match is sought. When a match is found, the
stored pattern name has a pointer to the name of a response function
which decides what to do further. If a match is not found, further
transformations of the pattern are carried out and a "fuzzy" match is
tried.
     For fuzzy matching at this stage, the elements in the pattern
are dropped one at a time and a match is attempted each time. This
allows familiar words in unfamiliar contexts to be ignored. For
example, "well" is important in "Are you well?" but meaningless in
"Well, are you?".
     Deleting one element at a time results in, for example, the
pattern:
     ( WHAT BE YOU MAIN PROBLEM )
becoming successively:
     (a) ( BE YOU MAIN PROBLEM )
     (b) ( WHAT YOU MAIN PROBLEM )
     (c) ( WHAT BE MAIN PROBLEM )
     (d) ( WHAT BE YOU PROBLEM )
     (e) ( WHAT BE YOU MAIN )
Since the stored pattern in this case matches (d), (e) would not be
constructed. We found it unwise to delete more than one element,
since our segmentation method usually yields segments containing a
small number (1-4) of words.
     Dropping one element at a time provides a probability threshold
for partial matching which is a function of the length of the
segment. If a segment consists of five elements, four of the five
must be present in a particular order (with the fifth element missing
in any position) for a match to occur. If a segment contains four
elements, three must match - and so forth.
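     The exact-then-fuzzy lookup can be sketched as follows, with
STORED standing in for the roughly 2000 stored patterns, each of
which points to the name of its response function:

        STORED = {("WHAT", "BE", "YOU", "PROBLEM"): "DISCUSS-PROBLEM"}  # assumed entry

        def match(segment):
            key = tuple(segment)
            if key in STORED:                    # complete and perfect match first
                return STORED[key]
            for i in range(len(key)):            # then drop one element at a time
                reduced = key[:i] + key[i + 1:]
                if reduced in STORED:
                    return STORED[reduced]
            return None                          # no match: default condition

        match(["WHAT", "BE", "YOU", "MAIN", "PROBLEM"])
        # -> 'DISCUSS-PROBLEM', found via deletion (d) above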
     The transformations described above result in a progressive
coarsening of the patterns by deletion. Substitutions are also made
in certain cases. Some patterns contain pronouns which could stand
for a number of different things of importance to PARRY2. The
pattern:
     ( DO YOU AVOID THEM )
could refer to the Mafia, or racetracks, or other patients. When such
a pattern is recognized, the pronoun is replaced by its current
anaphoric value as determined by the response functions, and a more
specific pattern such as:
     ( DO YOU AVOID MAFIA )
is looked up. In many cases, the meaning of a pattern containing a
pronoun is clear without any substitution. In the pattern:
     (( HOW DO THEY TREAT YOU ) ( IN HOSPITAL ))
the meaning of THEY is clarified by ( IN HOSPITAL ).
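     The recycle step can be sketched as follows; the anaphora table
is maintained by the response functions, which are not shown:

        STORED = {("DO", "YOU", "AVOID", "MAFIA"): "AVOID-MAFIA"}   # assumed entry
        ANAPHORA = {"THEM": "MAFIA", "THEY": "MAFIA"}   # current anaphoric values

        def recycle(pattern):
            # substitute current anaphoric values for pronouns and retry the lookup
            substituted = tuple(ANAPHORA.get(w, w) for w in pattern)
            return STORED.get(substituted)

        recycle(["DO", "YOU", "AVOID", "THEM"])    # -> 'AVOID-MAFIA'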

COMPOUND-PATTERN MATCH

     When more than one simple pattern is detected in the input, a
second matching is attempted. The methods used are similar to those
of the first matching, except that they occur at the segment level
rather than at the single-element level. Certain patterns, such as
( HELLO ) and ( I THINK ), are dropped because they are considered
meaningless. If a complete match is not found, then simple patterns
are dropped, one at a time, from the compound pattern. This allows
the input,
     (( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))
to match the stored pattern,
     (( HOW DO YOU COME ) ( IN HOSPITAL )).
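     A minimal sketch of the compound-pattern match; as before,
STORED and the list of meaningless patterns are small assumed
samples:

        MEANINGLESS = {("HELLO",), ("I", "THINK")}
        STORED = {(("HOW", "DO", "YOU", "COME"),
                   ("IN", "HOSPITAL")): "WHY-IN-HOSPITAL"}   # assumed entry

        def match_compound(segments):
            # drop meaningless simple patterns, then seek a complete match
            segs = tuple(tuple(s) for s in segments if tuple(s) not in MEANINGLESS)
            if segs in STORED:
                return STORED[segs]
            for i in range(len(segs)):      # drop one simple pattern at a time
                reduced = segs[:i] + segs[i + 1:]
                if reduced in STORED:
                    return STORED[reduced]
            return None                     # default condition

        match_compound([["HOW", "DO", "YOU", "COME"], ["TO", "BE"],
                        ["IN", "HOSPITAL"]])
        # -> 'WHY-IN-HOSPITAL'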

     If no match can be found at this point, the algorithm has
arrived at a default condition and the appropriate response functions
decide what to do. For example, in a default condition, the model may
assume control of the interview, asking the interviewer a question,
continuing with the topic under discussion, or introducing a new
topic.

ADVANTAGES AND LIMITATIONS

     As mentioned, one of the main advantages of a characterization
strategy is that it can ignore as irrelevant what it does NOT
recognize. There are several million words in English, each
possessing from one to one hundred senses. To construct a
machine-usable word dictionary of this magnitude is out of the
question at this time. Recognition of natural language input such as
described above allows real-time interaction in a dialogue, since it
avoids becoming ensnarled in combinatorial disambiguations and long
chains of inference which would slow a dialogue algorithm down to
impracticality, if it could function at all. The price paid for
pattern matching is that sometimes, but rarely, ambiguities slip
through.
     A drawback of PARRY1 was that it reacted to the first pattern it
found in the input rather than characterizing the input as fully as
possible and then deciding what to do based on a number of tests.
Another practical difficulty with PARRY1, from a programmer's
viewpoint, was that elements of the patterns were strung out in
various procedures throughout the algorithm. It was often a
considerable chore for the programmer to determine whether a given
pattern was present and precisely where it was. In the method
described above, the patterns are all collected in one part of the
data-base, where they can easily be examined.
     Concentrating all the patterns in the data-base gives PARRY2 a
limited "learning" ability. When an input fails to match any stored
pattern, or matches an incorrect one as judged by a human operator, a
pattern matching the input can be put into the data-base
automatically. If the new pattern has the same meaning as a
previously stored pattern, the human operator must provide the name
of the appropriate response function. If he does not remember the
name, he may try to rephrase the input in a form recognizable to
PARRY2, and it will name the response function associated with the
rephrasing. These mechanisms are not "learning" in the commonly used
sense, but they do allow a person to transfer his knowledge into
PARRY2's data-base with very little redundant effort.

PERFORMANCE

     We have a number of performance measures on PARRY1 along a
number of dimensions, including "linguistic non-comprehension". That
is, judges estimated PARRY1's abilities along this dimension on a 0-9
scale. They also rated human patients and a "random" version of
PARRY1 in this manner. (GIVE BAR-GRAPH HERE AND DISCUSS.) We have
collected ratings of PARRY2 along this dimension to determine whether
the characterization process represents an improvement over PARRY1.
(FRANK AND KEN EXPERIMENT.)